Lec01 - Mon 2/13: Introduction
Course Title
- In catalog: Introduction to Statistical Sciences
- New: Introduction to Statistical and Data Sciences
Data Science
- Example domains: biology, economics, physics, sociology, etc.
- So why the title switch?
Course Objective #1
Have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices.
- Cobb: Minimizing prerequisites to research
- Not necessarily publishing in top journals, but answering scientific questions with data.
- However, it is difficult to do research without understanding statistics
Data/Science Research Pipeline
We will, as best we can, perform all this:
Data/Science Research Pipeline
And not just this, as in many previous intro stats courses:
Course Objective #2
Foster a conceptual understanding of statistical topics and methods using simulation/resampling and real data whenever possible, rather than mathematical formulae.
- Whenever we can, use real data
- Example data set: nycflights13
- There are two “engines” that can make statistics “work”
- Mathematics: formulas, approximations, etc
- Computers: simulations, random number generation
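To make the two "engines" concrete, here is a minimal sketch in R that answers one question both ways. The question itself (the chance of at least one six in four rolls of a fair die) is an illustrative assumption, not an example taken from the course, and the names `n_sims`, `simulated`, and `exact` are made up for this sketch.

```r
set.seed(76)  # arbitrary seed so the run is reproducible

# "Computers" engine: simulate rolling a fair die 4 times, many times over
n_sims <- 10000
at_least_one_six <- replicate(n_sims, {
  rolls <- sample(1:6, size = 4, replace = TRUE)  # random number generation
  any(rolls == 6)                                 # TRUE if any roll is a six
})
simulated <- mean(at_least_one_six)  # proportion of simulations with a six

# "Mathematics" engine: the exact formula for the same probability
exact <- 1 - (5 / 6)^4

simulated
exact
```

The two numbers should land close to each other. The simulated value is only an approximation, and it gets closer to the formula's answer as `n_sims` grows.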
The “Engine” of Statistics
In this course, computers and not math will be the “engine”. What does this mean?
- Less of this:

- But more of this:

Programming/Coding
- Previous programming/coding experience is not a prerequisite to this course
- This is not explicitly a course on programming, coding, or computer science, but we will use some elements of each.
- You will also be exposed to basic algorithmic thinking and computational logic
- Learning R is like learning a foreign language: it's really hard at first!
Two Simple Rules of Learning Code
- Computers are stupid!
- When learning, take existing code that works, and tweak it!
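As a rough illustration of both rules, the sketch below starts from a line of R that works and then changes one thing at a time. The vector `delays` and its values are made up for this example.

```r
# Existing code that works: the average of a few made-up numbers
delays <- c(5, 12, 0, 47, 3)
mean(delays)

# Tweak 1: swap one function for another and rerun
median(delays)

# Tweak 2: change the data and rerun the original line
delays <- c(5, 12, 0, 47, 3, 120)
mean(delays)

# "Computers are stupid": R does exactly what you type, nothing more.
# R is case-sensitive, so the line below would stop with an error
# if you removed the leading "#":
# Mean(delays)
```

Changing one small piece at a time makes it easy to see which change caused which result.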
Course Objective #3
Blur the traditional lecture/lab dichotomy of introductory statistics courses by incorporating more computational and algorithmic thinking into the syllabus.
- Completely separate lecture and labs is a legacy of a time before

RStudio Server
- Not all laptops are created equal: operating system, processing power, age
- RStudio Server: cloud-based version of RStudio where all processing is done on Middlebury servers
- Access: go/rstudio/ (on campus or via VPN)
Course Objective #5
Develop statistical literacy by, among other ways, tying the curriculum to current events and demonstrating the importance of statistics in society.
- H.G. Wells (paraphrased): “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”
- Me: “Sure, it’s easy to lie with statistics. But it’s also hard to tell the truth without them.”
Final Project
- Capstone experience to align the topics and principles of this course with how research and learning are done in practice.
- Work on interpersonal and collaborative skills. No textbook on that!
R, RStudio, and DataCamp
- R: Software behind the scenes, i.e., the engine
- RStudio: Integrated development environment, i.e., the interface
- DataCamp: Browser-based learning tool, i.e., the driver’s ed teacher
Test Drive RStudio
- Log in to go/rstudio/ with your Midd account
- If you don’t have access, raise your hand. (Username: guest1, password: rstudioguest)
- In RStudio menu bar -> File -> New File -> R Script
The Four Panels
- Console: Crunch numbers in R
- Files, Packages, Help: See your files, install packages, read help files
- Editor: Where you’ll write code and save it
- Environment: Your workspace
Important: Console
- This is where you run/execute commands
- The “>” is the prompt. It means R is ready to receive commands
- If you don’t see a “>” and want to restart, press ESC.
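A few commands you might try typing at the prompt, one line at a time, pressing Enter after each. These are illustrative only, and the name `x` is arbitrary.

```r
2 + 3        # arithmetic: R prints the result right away
x <- 10      # store the value 10 under the name x (nothing is printed)
x * 2        # use the stored value
sqrt(x)      # call a function on it

# If you press Enter on an unfinished command, such as "2 +",
# R shows "+" instead of ">" because it is waiting for the rest.
# Finish the command, or press ESC to get back to the ">" prompt.
```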
Switching Gears
Now we will use R via DataCamp instead of via RStudio, but just for driver’s ed. Two panels exist in both:
- Editor panel: Where you write code
- Console panel: Where you will execute code